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Abstract — The observations in many applications consist of 
counts of discrete events, such as photons hitting a detector, 
which cannot be effectively modeled using an additive bounded or 
Gaussian noise model, and instead require a Poisson noise model. 
As a result, accurate reconstruction of a spatially or temporally 
distributed phenomenon (/*) from Poisson data {y) cannot be 
effectively accomplished by minimizing a conventional penalized 
least-squares objective function. The problem addressed in this 
paper is the estimation of /* from y in an inverse problem 
setting, where (a) the number of unknowns may potentially be 
larger than the number of observations and (b) /* admits a 
sparse approximation. The optimization formulation considered 
in this paper uses a penalized negative Poisson log-likelihood 
objective function with nonnegativity constraints (since Poisson 
intensities are naturally nonnegative). In particular, the proposed 
approach incorporates key ideas of using separable quadratic 
approximations to the objective function at each iteration and 
penalization terms related to £i norms of coefficient vectors, total 
variation seminorms, and partition-based multiscale estimation 
methods. 

Index Terms — Photon-limited imaging, Poisson noise, wavelets, 
convex optimization, sparse approximation, compressed sensing, 
multiscale, total variation 



I. Introduction 

IN a variety of applications, ranging from nuclear medicine 
to night vision and from astronomy to traffic analysis, data 
are collected by counting a series of discrete events, such as 
photons hitting a detector or vehicles passing a sensor. The 
measurements are often inherently noisy due to low count 
levels, and we wish to reconstruct salient features of the 
underlying phenomenon from these noisy measurements as 
accurately as possible. The inhomogeneous Poisson process 
model [1] has been used effectively in many such contexts. 
Under the Poisson assumption, we can write our observation 
model as 

y ^ Poisson(A/'^), (1) 

where f"" G W^ is the signal or image of interest, A G R!J!^'' 
linearly projects the scene onto an m-dimensional set of 
observations, and y G TL^ is a length-m vector of observed 
photon counts. 

The problem addressed in this paper is the estimation 
of P from y when /^ is sparse or compressible in some 
basis W (i.e., f'^ = WO^ and 0^ admits an accurate sparse 
approximation) and the number of unknowns n may be larger 
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than the number of observations m. This challenging problem 
has clear connections to compressed sensing (CS) [2], [3], [4], 
[5], but arises in a number of other settings as well, such as to- 
mographic reconstruction in nuclear medicine, superresolution 
image reconstruction in astronomy, and deblurring in confocal 
microscopy. In recent work [6], [7], we explored some of 
the theoretical challenges associated with CS in a Poisson 
noise setting, and in particular highlighted two key differences 
between the conventional CS problem and the Poisson CS 
problem: 

• unlike many sensing matrices in the CS literature, the 
matrix A must contain all nonnegative elements, and 

• the intensity /^ is nonnegative, and hence any estimate / 
thereof must also bejionnegative. 

The nonnegativity of / and A results in challenging op- 
timization problems. In particular, the restriction that / is 
nonnegative introduces a set of inequality constraints into the 
minimization setup; as shown in [8], [9], these constraints are 
simple to satisfy when / is sparse in the canonical basis, but 
they introduce significant challenges when enforcing sparsity 
in an arbitrary basis. 

A. Problem formulation 

Under the Poisson model (1), the probability of observing 
a particular vector of counts y is given by 



p(yi^r) = n 



Vi 



■exp(-ef^r) 



(2) 



where e^ is the i canonical basis unit vector. Thus the 
negative Poisson log-likelihood is given by 

m 

F{f) = l^Af -Y^Vi ^og{eJ Af), (3) 

where 1 is an m- vector of ones. (Here we neglect \og{yi\) 
terms since they are constant with respect to / and hence do 
not impact our objective.) We will see later that in order to 
avoid the singularity at / = 0, it is advantageous to introduce 
a small parameter /3 > where typically /3 <C 1: 



F(f) ^ l^Af -J^Vi ^ogiefAf + p). 



(4) 



Similar techniques are used in [10] in the form of known 
background intensities, where the observations are modeled by 
y ~ Poisson (A/^ + 6) with h G R!f: known a priori. If min(6) 
is large enough /3 is not explicitly needed, however in our 
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experiments we assume b = and use /3 = 1 x 10~^^, much 
smaller than any reasonable background count magnitude. This 
small parameter will also appear in the gradient 



VF(/) = A^l 



E 



Vi 



and the Hessian 



V^Fif) = A^ 



m 

E 



'jAf 



Vi 



■P 



-.A^e 



iejAf + py 



T 



A. 



(5) 



(6) 



Our Poisson reconstruction algorithms take the form of the 
following constrained optimization problem: 



minimize ^(f) 



F(/)+rpen(/) 



(7) 



R is the negative Poisson log-likelihood in 



and 



subject to / > 0, 

where 

. F : R^ 
(4), and 

• pen : R"^ ^ R is a finite, usually nonsmooth, 
potentially nonconvex penalty functional. 
In this paper, we will consider several variants of the penalty 
term, including 1 1 / 1 1 1 , 1 1 W^f 1 1 1 for some arbitrary orthonormal 
basis W, a total-variation seminorm ||/||tv, and a complexity 
regularization penalty based upon recursive dyadic partitions. 
We refer to our approach as SPIRAL (Sparse Poisson Intensity 
Reconstruction ALgorithm). 

B. Related work 

Various regularization techniques are often employed to 
compensate for the ill-posedness of the estimation problem. 
Outside the Poisson context, for example in the presence of 
additive white Gaussian noise, regularization methods based 
on wavelet or curvelet sparsity [5], models of wavelets' 
clustering and persistence properties [11], and a variety of 
other penalties (cf. [12]) have proven successful. 

The key challenge in Poisson intensity estimation problems 
is that the mean and the variance of the observed counts 
are the same. As a result, conventional approaches based on 
a penalized least-squares objective function (cf. [12], [13]) 
will yield suboptimal results when applied to Poisson data 
with low intensities. Variance-stabilizing transforms (VSTs), 
such as the Anscombe transform [14] and the Haar-Fisz [15], 
[16] transform, are widely used to address this issue and to 
approximate the Poisson observations by Gaussian random 
variables [17], [18]. Jansen proposes a wavelet based Poisson 
estimation method based on data-adaptive VSTs and Bayesian 
priors on the stabilized coefficients [19]. However, as pointed 
out in [20], [21], such approximations are inaccurate when 
the observed number of photons per pixel or voxel is very 
low and tend to oversmooth the resulting estimate. In a more 
recent work, Zhang et al. [22] propose a multiscale variance- 
stabilizing transform (MSVST) which applies a VST to the 
empirical wavelet, ridgelet or curvelet transform coefficients. 
However, theoretical analysis of this approach is not available 
and it is not clear how to extend the MSVST to Poisson inverse 
problems. 



Several authors have investigated reconstruction algorithms 
specifically designed for Poisson noise without the need for 
VSTs. In [23], [24], Nowak and Kolaczyk describe mul- 
tiscale Bayesian approach in an Expectation-Maximization 
(EM) framework to reconstruct Poisson intensities. In their 
seminal paper [25], Kolaczyk and Nowak present a multi- 
scale framework for likelihoods similar to wavelet analysis 
and propose a denoising algorithm based on the complexity- 
penalized likelihood estimation (CPLE). The CPLE objective 
function is minimax optimal over a wide range of isotropic 
likelihood models. There are several variants of the CPLE 
method depending upon the nature of the image or signal being 
denoised [26], [27], [28]. 

Regularization based on a total variation (TV) seminorm 
has also garnered significant recent attention (cf. [29], [30]). 
This seminorm is described in detail below; in general, it 
measures how much an image varies across pixels, so that a 
highly textured or noisy image will have a large TV seminorm, 
while a smooth or piecewise constant image would have a 
relatively small TV seminorm. This is often a useful alternative 
to wavelet-based regularizers, which are also designed to be 
small for piecewise smooth images but can result in spurious 
large, isolated wavelet coefficients and related image artifacts. 

In the context of Poisson inverse problems, however, adap- 
tation of these regularization methods can be challenging for 
two main reasons. First, the negative Poisson log-likelihood 
used in the formulation of an objective function often re- 
quires the application of relatively sophisticated optimization 
theory principles. Second, because Poisson intensities are 
inherently nonnegative, the resulting optimization problem 
must be solved over a constrained feasible set, increasing the 
complexity of most algorithms. Some recent headway has been 
made using multiscale or smoothness-based penalties [26], 
[31], [32]. In one recent work [33], the Poisson statistical 
model is bypassed in favor of an additive Gaussian noise 
model through the use of the Anscombe variance stabilizing 
transform. This statistical simplification is not without cost, as 
the linear projections of the scene must now be characterized 
as nonlinear observations. Other recent efforts [34], [35] solve 
Poisson image reconstruction problems with TV seminorm 
regularization using a split Bregman approach [36], but the 
proposed methods involves a matrix inverse operation which 
can be extremely difficult to compute for large problems 
outside of deconvolution settings. TV seminorm regularization 
is also explored in the context of a Richardson-Lucy algorithm 
[37], but nonnegativity and convergence were not explicitly 
addressed. Finally, the approaches in [38], [39] apply proximal 
functions to solve more general constrained convex minimiza- 
tion problems. These methods use projection to obtain feasible 
iterates (i.e., nonnegative intensity values), which may be 
difficult for recovering signals that are sparse in a noncanonical 
basis. 



C Contributions of the proposed method 

In the proposed work, we present a general algorithmic 
framework for solving Poisson inverse problems. This frame- 
work requires no special structure in the sensing matrix A, 
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yet can take advantage fast matrix- vector multiplications when 
available. The success of this approach hinges on the ability 
to solve certain constrained subproblems quickly. We show 
that this can be done for a variety of regularization schemes. 
Additionally, we consider a step selection procedure that is 
better suited for the Poisson log-likelihood versus the corre- 
sponding generalizations from the Gaussian log-likelihood. In 
the subsequent analysis, we establish global convergence of 
the constrained optimization under a set of mild assumptions 
which are easily satisfied in practice. We establish properties of 
the solution set, which yield conditions for a unique solution to 
the minimization. Our approach is then supported by numerical 
simulations which show state-of-the-art performance on a 
simulated limited-angle emission tomography inverse problem. 



Hessian. Since F is convex, this guarantees a/e > 0, even 
without the safeguard. The traditional BB scheme chooses 
o^k = (7^)^^Vll^^ll2' where 7^ = VF(/^) - VF(/^-i). 
When applied to a quadratic objective, such as arising from 
considering the Gaussian log-likelihood, these two schemes 
are identical. However, when applied to the Poisson log- 
likelihood, our choice is more strongly effected by the cur- 
vature at the current iteration. Our modified BB choice is 
no more expensive to compute than the original BB scheme 
since AS^ and Af^ are already available from the gradient 
computations, and computing a^ in (12) is a simple sequence 
of 0(n) operations. 

This initial choice of ak is used if the resulting solution of 
(10) satisfies the acceptance criteria 



n. Algorithms 

Our approach to solve the minimization problem (7) em- 
ploys sequential quadratic approximations to the Poisson log- 
likelihood F{f). More specifically, at iteration k we compute 
a separable quadratic approximation to F{f) using its second- 
order Taylor series approximation at /^; this approximation is 
denoted F^{f). The next iterate is then given by 

/^+i = argminF^(/)+rpen(/) 

/GR- (8) 

subject to / > 0. 

Similar to the framework described in [12], F^ is a second- 
order Taylor series approximation to F with the Hessian 
V^F(/^) approximated by a scaled identity matrix a^I, with 
ak > 0. This yields 

F'if) = F{f) + (/ - f'^fVFif'^) + f 11/ - f'^Wl (9) 

With this separable approximation, simple manipulation of (8) 
yields a sequence of subproblems of the form 



Hfn < 



f^' = argmin ^^{f) = ^||/ - 5^||^ + ^Pen(/) 



/GR- 

subject to / > 0, 



where 



s' = f- ^VF(/^). 



(10) 



(11) 



This formulation has the benefit of being easily recognized 
as a nonnegatively constrained £2 denoising of the gradient 
descent step s^. 

The parameter ak is chosen via a sequence of two repeated 
steps. First, a modified Barzilai-Borwein (BB) method [40] is 
used to choose the initial value of ak- With S^ = f^ — /^~^, 
we initially choose 

{S'fV'F{f^)5' _ \\^-{AS')/{Af^ml 



c^k 



\mi 



(12) 

with ak safeguarded to be within the range [amin, <^max]- 
Here ^/~^, •, and / are understood as component- wise. This 
method allows our separable approximation (9) to capture the 
curvature of the Poisson log-likelihood F{f) along the most 
recent step S^, in the vicinity of the current iterate /^. 

The astute reader will recognize (12) as a Rayleigh quotient, 
and hence always has a value within the spectrum of the 



max 

--[k-M] + , 



Hf)-^\\f 



/c+1 



/ 



k\\2 

25 



(13) 



where M is a nonnegative integer, a G (0, 1) is a small 
constant, and the operation [ • ]+ = max{0, • }. If it fails this 
acceptance criteria, ak is repeatedly increased by a factor r] 
until the solution to (10) satisfies (13). This gentle criteria 
allows the nonmonotonic objective behavior characteristic of 
the Barzilai-Borwein methods [12], [40], yet enforces that 
the next iterate have a slightly smaller objective than the 
largest value over the past M iterations. Note that choosing 
M = results in a purely monotonic algorithm. For clarity, 
we describe our general procedure in Algorithm 1. 

Algorithm 1 Sparse Poisson Intensity 
Reconstruction ALgorithm (SPIRAL) 

1: Initialize Choose 7^ > 1, a G (0, 1), M e Z+, < amm < 
amax, and initial solution /^. Start iteration counter k ^ 0. 
2: repeat 

3: choose ak G [amin,Q^max] by (12) 
4: /^+^ ^ solution of (10) 
5: while /^^^ does not satisfy (13) do 

c^k ^ V^k 

y/c+i ^ solution of (10) 

end while 

9: k ^ k^l 

10: until stopping criterion is satisfied 



A. Canonical basis with sparsity penalty 

When pen(/) = ||/||i, the minimization subproblem (10) 
has the following analytic solution: 



/ 



fc+i _ 



^1 

OLk 



with [•]+ acting component- wise. Thus solving (7) subject 
to nonnegativity constraints with an £1 penalty function mea- 
suring sparsity in the canonical basis is straightforward. An 
alternative algorithm for solving this Poisson inverse problem 
with sparsity in the canonical basis was also explored in the 
recent literature [8]. 
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B. Non-canonical basis 

Now suppose that the signal of interest is sparse in some 
other basis. Then the ^i penalty term is given by 



where 



pen(/) ^ WW^fh = \\e\\ 



e^w'f 



(14) 



for some orthonormal basis W. When the reconstruction / = 
W9 must be nonnegative (i.e., WO > 0), the minimization 
problem 

^^+1 ^ argmin ^\e) ^ ^||^ - ^^1^ + ^||^||i, 

subject to We>{) (15) 

no longer has an analytic solution necessarily. We can solve 
this minimization problem by solving its Lagrangian dual. 
First, we reformulate (15) so that its objective function (})^{6) 
is differentiable by defining u^v >{) such that = u — v. The 
minimization problem (15) becomes 

{u^^\v^^^) ^ argmin l\\u - v - s^g ^ ^l^{u ^ v) 
subject to u,v> 0, W{u -v)>0, (16) 

which has twice as many parameters and has additional 
nonnegativity constraints on the new parameters, but now has 
a differentiable objective function. The Lagrangian function 
corresponding to (16) is given by 

-Xju-X^v-XlW{u-v), 

where Ai,A2,A3 G R^ are the Lagrange multipliers corre- 
sponding to the constraints in (16). Setting the derivative of 
^ with respect to u and v to zero, we obtain 

u-v = s^ ^Xi-^1^ W^Xs, and (17) 

which leads to the Lagrangian dual function 

5(Ai, As) = -^Ws" + Ai - ^1 + W^\,\\l + ^llslli 

We define 7 = Ai — ^1. For the Lagrange dual problem 
corresponding to (16), the Lagrange multipliers Ai,A2,A3 > 
0. Since A2 = —1 - Ai, then -^1 < 7 < ^1. Also, 
let A = A3. The Lagrange dual problem associated with this 
problem is thus given by 



minimize h{X,^) ^ l\\s'^ + ^ + W^Xg - '^\\sX 



subject to A>0, -^1 < 7 < ^1 



(18) 



and at the optimal values 7^ and A^, the primal iterate 0^~^^ 
is given by 



3fc+l A ,,fe + l _ ,,fc+l 



5^+7^ + WA\ 



We note that the minimizers of the primal problem (16) and its 
dual (18) satisfy 0^(6>^+^) = -/i(7^,A^) since (16) satisfies 



(a weakened) Slater's condition [41]. In addition, the function 
— /i(7, A) is a lower bound on (j)^{0) at any dual feasible point. 
The objective function of (18) can be minimized by alter- 
natingly solving for A and 7, which is accomplished by taking 
the partial derivatives of /i(A, 7) and setting them to zero. Each 
component is then constrained to satisfy the bounds in (18). At 
the j*^ iteration, the variables can, thus, be defined as follows: 






1-^ 

1^ Oik 

v(s''- 



1, 



T^^A^-i) 






U) 



(19) 



where the operator mid{a, 6, c} chooses the middle value of 
the three arguments component- wise. Note that at the end of 
each iteration j, the approximate solution 0^^"^ = s^ -\- j^^") -\- 
W^X^^"^ to (15) is feasible with respect to the constraint WO > 
0: 



we^^^ 



Ws^ + Wj^^^ + A^^') 



W(5^+7^^^)+ -W(5^+7^^^) 



M)) 



W 



(.^+7^^'^) 



(20) 



> 0. 



Thus, we can terminate the iterations for the dual problem 
early and still obtain a feasible point. By solving the subprob- 
lem "loosely" (see Sec. IH-C), the algorithm can converge 
faster but to a potentially less accurate estimate. In Sec. IV, 
we show results from both "loose" and "tight" criteria for 
solving the subproblem. We call this approach SPIRAL-^i. 

C Total variation penalty 

We also consider an approach where the TV seminorm is 
used as a penalization scheme, i.e., pen(/) = ||/||tv- We 
define the anisotropic TV seminorm as 



llTV 



y/n—1 y/n 
k=l 1=1 



k,l 



//e+l,d 



V^ Vn— 1 

EEi/' 



k,l 



■ fk,l + l\ 



where - for simplicity of presentation, not algorithmic neces- 
sity - we use a slight abuse of notation by using 2D pixel 
indices instead of vector indices by assuming that / G W^ is 
a square ^/n x ^/n image. This highlights the fact that the TV 
seminorm is simply a measure of the magnitude of all vertical 
and horizontal first-order differences. Said differently, one can 
think of an image with a small TV seminorm as one that is 
sparse with respect to an overcomplete representation of these 
first-order differences (neglecting the mean of /), meaning 
the image has few abrupt changes in pixel intensity yet many 
regions of homogeneous signal level. This property makes 
TV especially well suited for image denoising and inverse 
problems. 

Recent work by Beck and Teboulle [30] presents a fast 
computational method for solving the TV-regularized problem 



/ = arg min 
subject to 



IWAf- 



■b\\l 



All/ll 



TV 



(21) 
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where A > is a tuning parameter, C is a closed convex set 
and A is a linear, spatially invariant blur operator. This method 
utilizes a gradient-based optimization approach founded on 
a monotone iterative shrinkage and thresholding algorithm. 



When we choose f = f 



/c+l 



/, b 



A = r/a/e. 



and C = {/ G R^ : / > 0}, (21) then reduces to an £2 
denoising of s^ with a total variation regularizer: 



/ 



fc+i _ 



argmin |||/ - 
subject to / > 0, 



^'^lli 



IITV 



(22) 



precisely the form (10) required for use in our algorithmic 
framework. We call this approach with a total variation penalty 
SPIRAL-TV. 

D. Partition-based methods 

An alternative to the £1- and TV-norm penalties can be 
formulated using model-based estimates that utilize structure 
in the coefficients beyond that of a sparse representation. In 
particular, we build upon the framework of recursive dyadic 
partitions (RDP), which we summarize here and are described 
in detail in [26]. Let V be the class of all recursive dyadic 
partitions of [0,1]^ where each cell in the partition has a 
sidelength at least \l \fn, and let P G 7^ be a candidate 
partition. The intensity on P, denoted /(P), is calculated 
using a nonnegative least-squares method to fit a model (such 
as a constant) to s^ in (15) in each cell in the RDP. As 
an example, consider Fig. 1. Here we approximate the true 
image (Fig. 1(a)) on the recursive dyadic partition defined in 
Fig. 1(b)). The result is a piecewise constant approximation to 
the emission image (Fig. 1(c)). We see that the partition model 
is able to accurately capture the image in clear multiresolution 
fashion: large homogeneous regions are well-modeled by large 
cells, whereas edges are approximated via the deeper recursive 
partitioning. Furthermore, a penalty can be assigned to the 
resulting estimator which is proportional to |P|, the number 
of cells in P. Thus we set 



p/c+l 



argmin f||s- -/(P)||2- 
Pep 



(23) 



^(pfe+i). 



A search over V can be computed quickly using a dynamic 
program. When using constant partition cell models, the non- 
negative least-squares fits can be computed non-iteratively in 
each cell by simply using the maximum of the average of s^ in 
that cell and zero. Because of this, enforcing the constraints is 
trivial and can be accomplished very quickly. The disadvantage 
of using constant model fits is that it yields piecewise constant 
estimates. However, a cycle- spun translation-invariant (TI) 
version of this approach [26] can be implemented with high 
computational efficiency and can be used for solving this 
nonnegative regularized least- squares subproblem that results 
in a much smoother estimator. We refer to these approaches 
as SPIRAL-RDP and SPIRAL-RDP-TI. 

It can be shown that partition-based denoising methods 
such as this are closely related to Haar wavelet denoising 
with an important hereditary constraint placed on the thresh- 
olded coefficients - if a parent coefficient is thresholded. 



then its children coefficients must also be thresholded [26]. 
This constraint is akin to wavelet-tree ideas which exploit 
persistence of significant wavelet coefficients across scales 
and have recently been shown highly useful in compressed 
sensing settings [11]. Since in this context the penalty can 
be thought of an £^ measure, the resulting RDP-based penalty 
function is not a convex function. Hence we can only guarantee 
convergence to a local minimizer. Given a sufficiently accurate 
initialization, the RDP-based method performs competitively 
in both speed and accuracy to convex penalties, for which 
global convergence is assured. 
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Fig. 1: Example of a partition-based approximation, (a) True 
image, (b) Recursive dyadic partition (RDP). (c) RDP-based 
approximation of the true image. 



III. Algorithmic details 

A. Computational complexity and nonmonotonicity 

It should be noted that in determining an acceptable next 
iterate /^^^ in (8), the main computational burden is often 
checking the acceptance criteria (13) as testing each candidate 
solution involves recomputing Aj^^^ . In the worst case, A 
may be a dense unstructured matrix for which computing 
matrix-vector products is costly. Although enforcing near- 
monotonicity of the objective aids convergence to a more 
accurate solution to the reconstruction problem (7), significant 
demonstrable gains in computational speed may be achieved 
by forgoing this criteria and simply accepting the choice (12). 
In this case, an efficient implementation of our algorithm only 
requires two matrix-vector multiplications involving A per 
iteration, one computing Aj^ for defining a/^ in (12), and the 
other using A^ in the computation of \/F{f^) for defining the 
gradient descent result s^ in (11). Since (10) requires only ak 
and s^, there are no matrix- vector multiplications involving A 
in the denoising subproblem. 



B. Initialization 

While global convergence proofs - such as the one in 
Section III-D - guarantee convergence for any initial point 
/^ > 0, the choice of initialization is an important practical 
consideration. Iterative algorithms are rarely allowed to exe- 
cute long enough to converge to an optimal solution, meaning 
the approximate solution may be strongly dependent on the 
starting point of the algorithm. Further, nonconvex penalties 
such as our RDP-based penalization scheme, may introduce 
local optima that are difficult to avoid if the initialization is 
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chosen poorly. Although in both these cases a better initial- 
ization typically yields better performance, it is undesirable to 
spend significant computational resources in doing so. 

In practice, one approach is to initialize with an 
appropriately- scaled A^y. We have found this approach par- 
ticularly effective in compressive sensing contexts where the 
sensing matrix A acts as a near-isometry. In many appli- 
cations, a Fourier-based inversion scheme often leads to an 
effective initialization. In the emission tomography example 
we consider in Sec. IV, a filtered back-projection estimate pro- 
vides a sufficiently good initialization with low computational 
cost. In particular, the initialization we use in the numerical 
experiments results from two iterations of a non-convergent 
version of the EPL-INC-3 algorithm, initialized by the filtered 
back-projection estimate. The EPL-INC-3 method employs an 
incremental penalized Poisson likelihood EM algorithm with 
an image roughness penalty, more details are available in 
Section IV-B. 



C Termination criteria 

In this section, we list criteria by which SPIRAL decides 
whether the iterates in the subproblem (10) and for the 
main problem (7) are acceptable approximations to the true 
minimizers to terminate the algorithm. Here, we only provide 
criteria for the SPIRAL-^i subproblem since the global solu- 
tion to the partition-based SPIRAL subproblem (23) can be 
easily and exactly obtained using a non-iterative tree-pruning 
algorithm [26], even though its objective function is nonconvex 
(due to the nonconvex penalty). For the TV-based method, we 
use the standard convergence criteria implemented by Beck 
and Teboulle [30]. 

1) SPIRAL- li subproblem: The criterion for termination for 
the SPIRAL-£i subproblem measures the duality gap. Since 
the objective function (j)^{0) in (15) is convex and all the 
constraints are affine, then (a weaker) Slater's condition holds 
[41] and, therefore, the duality gap is zero, i.e., 

where (7^,A^) solves (18). Recall that, at the j*^ iterate, 

qU) — gk j^ ^{j) j^ ^/'Ty^{j) ^^^ ^^ viewed as an approximate 

solution to 6>^+^ Thus, the SPIRAL approach for the ^i 
subproblem will consider the iterates to be sufficiently close 
to the optimal solution if 



|0^(6>(^")) + /i(7(^"),A(^"))| 
WW)\ 



< tOlc 



where tolsuB > is some small constant. In our numerical 
experiments, we often found that it is not necessary to solve 
this subproblem very accurately, especially at the beginning of 
the algorithm where the iterate 9^ is still far from the optimal 
solution. 

2) SPIRAL: Since the global minimizer /"^ of (7) is not 
known a priori, criteria to terminate the SPIRAL algorithm 
must be established to determine whether a computed mini- 
mizer / is an acceptable solution. We list two such criteria. 
The first of these criteria is simple: terminate if consecutive 



iterates or the corresponding objective values do not change 
significantly, i.e., 

ll/'+'-/'l|2/||/'=||2<tolP 



(24) 



or 



!*(/'+') -*(/')l/l*(/')l<toip, 



where to IP is a small positive constant. The advantage of 
these criteria is that they apply to general penalty functions. 
The disadvantage, however, is that it is possible that the 
change between two consecutive iterates may be small or 
that they result in only small improvements in the objective 
function even though iterates are still far from the true solution. 
However, we have yet to observe this premature termination 
in practice. 

The next criterion applies only to SPIRAL-^i, where the 
penalty is convex and, after a change of variables, differ- 
entiable. This criterion is based on the Karush-Kuhn-Tucker 
(KKT) conditions for optimality: at the k^^ iteration, given 
0^ and the corresponding Lagrange multipliers A^ computed 
from (18), we determine whether 



||V<^((9^)-W^A^||2 < tolP, 



(25) 



The left hand side corresponds to the gradient of the La- 
grangian function, which criterion (25) forces to be suffi- 
ciently close to zero. A complementarity condition could also 
be required, but by construction, it is always satisfied, i.e., 
{X^)^WO^ = using (19) and (20). 

D. Convergence proof 

In this section we consider the minimization problem 



minimize ^(/) — F{f) + rpen(/) 
subject to / > 0, 



(26) 



where F is the negative Poisson log-likelihood defined in (3). 
To ease analysis, we consider the equivalent problem of 



mimmizeF(/) + p(/), 



(27) 



where p : R^ ^ R = R U {— oo, oo} is defined to be 

p{f)=Tpen{f) + 5+{f), (28) 

with (5+ = d]^ri being the indicator function of the nonnegative 
orthant (our feasible set): 



^+(/) 



if / > 0, 
oo otherwise. 



(29) 



For notational simplicity, we still refer to the objective as 
^(/) = F{f) -\- p{f). Considering the constraints in this 
manner allows one to avoid explicit examination of the 
convergence of the Karush-Kuhn-Tucker (KKT) conditions, 
and only consider subgradient calculus. For more details, see 
Rockafellar and Wets [42]. 

We make the following mild assumptions: 

• (Al) F is proper convex (i.e., F is convex with F{f) > 

— oo for all / and F{f) < oo for some /) and Lipschitz 

continuously differentiable on R!f:, 
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• (A2) p is proper convex and continuous on R!J:, 

• (A3) ^ is coercive (i.e., lim||j||^oo ^(/) = co), 

We show that the assumption (Al) is satisfied by the 
negative Poisson log-likeHhood through the following lemma. 

Lemma 1. The negative Poisson log-likelihood with parame- 
ter ^ > {) is Lipschitz continuously differentiable on W^ with 
Lips chit z constant 



L<'^\\A\\l 



max(v) / .T^^N / .^N 

< — -^max(A^l)max(Al). 



Proof: Clearly the Hessian of F (6) is positive semidef- 
inite; therefore all that is required is a bound on the largest 
eigenvalue of V^F(/) over the feasible set. That is, we need 
to bound 

Amax = sup sup Z^V'^F{f)z. 

/>0||^||2<1 

Since A is nonnegative, the supremum over / > is attained at 
/ = 0, as this minimizes the denominator of the fraction in (6). 
Therefore we simply need to bound the largest eigenvalue of 
V^F(O) = -^A^T>'mg{y)A. Using properties of matrix norms, 
we have 



^max — 



1 

max(^) 

^^ 

max(7/) 



^^Diag(2/M||2<^||A||2||Diag(y)||2 



< 



/32 

max(^) 



Ah\\A\\ 



max(A^l)max(Al). 



Since a bound on the largest eigenvalue of the Hessian is the 
same as a bound on the Lipschitz constant of the gradient, this 
completes the proof. ■ 

We present the latter bound in Lemma 1 since the matrix 
A is often such that computing ||A||2 may be difficult, yet 
matrix-vector products with A and A^ can be computed at 
far lower cost. 

Before launching into the details of our convergence proof, 
let us recall a few facts. A point / is said to be critical for 
(27) if 

OGS$(/) = VF(/) + Sp(/), 

where dg{f) denotes the subdifferential (set of all subgra- 
dients) of ^ at / [42]. A minimizer / and corresponding 
minimum ^{f) are guaranteed to exist for <l> coercive, proper 
convex, and lower semicontinuous, with criticality of / suf- 
ficient for / to optimize (27). In solving (27), we generate a 
sequence of iterates {/^}fcGZ+' which due to the nonnegativity 
constraint, we have /^ > for all /c G Z+. 

We also simplify notation. Recall 5^^^ — f^~^^ — /^, and 
let l{k) = argmax^^[;._^] ;.^(/*), i.e., the index where 
^ is largest over the past M iterations. Then (13) is more 
simply written as 



^(/^+i) < $(/^(^)) 



Tll^'^'ll'. 



(30) 



It can be shown that the sequence $(/^(^)) is nonincreasing, 
and further that ^{f) = ^{f^^^) > ^{f^^^) > ^{f). 
Hence from assumption (A3), all the iterates are contained in 
a compact convex subset of W^ (namely the initial sublevelset 



{/ : ^(/) < ^(/^)}), and, therefore, we are guaranteed to 
have at least one convergent subsequence of iterates. 

Following the construction in [12], the proof of our global 
convergence result (stated in Theorem 1 below) is best pre- 
sented by first proving a set of supporting lemmas. The proofs 
of these lemmas are inspired by [12] and [43], which consider 
the unconstrained setting only. Many important results hinge 
only on the analysis of the objective values and the acceptance 
rule (30), and can be carried over to the constrained setting 
without any modification. In such cases we only highlight what 
restrictions must be in effect. 

However, some care must be taken in restricting the feasible 
domain to W^ in the proof of Lemma 2. The proof in 
[12] relies on a result in [44], which holds only for open 
sets, not the closed set W^. To show this result in the 
constrained setting, we assume that p is continuous on Mi^. 
This assumption is stronger than the lower semicontinuity 
assumed in the unconstrained case; however it is still satisfied 
by the penalization schemes considered in Section II. 

Lemma 2. Suppose f G R!f: is not critical for (27). Then 
for any a > amin, there exists an e > such that for any 
subsequence {f^^}je'z+ ^ith Mvnj^oo f^^ — fy f^^ ^ I^+ 
for all j G Z+, and amin ^ <^k ^ol, we have 



||^/C,+1||^^||^/C,+1 



r 



> e 



for all j sufficiently large. 



Proof: Assume ||(^^^^^||2 -^ for contradiction, imply- 



ing lim^-^oo/^'^^ 



lim.- 



.{S 



fc,+i 



/^j) = /, where 



fkj+i ^ j^n f^^ ^Yl j G Z+. Now since /^j+^ is critical 
for the subproblem (8), we have 

G VF(/^0 + c.fc/^^'+' +Sp(/^^'+')- 
From the definition of 9p(/^^+^), we have 

p{z) > p{f^+') - [VFify) + ak,6'^^+Yiz - f"'^')- 

for all z G R"^. We now take the limit as j -^ oo. Since 
both p and VF are continuous on R!f:, p{f^^^^) -^ p{f) and 
VF(/^^) -^ VF(7). Additionally, since S^^^^ -^ with ak^ 
bounded, akjS^^~^^ -^ 0, hence for all z G R"^ we have 

p{z)>p{7)-VF{7f{z-7), 

showing G VF(/) -\- dp{f), implying / is a critical point 
for (27), the desired contradiction. Hence we cannot have 

||(5A^j+i||2 ^ as assumed. ■ 

Lemma 3. Given a G (0, 1), then there exists an a > 
such that for any sequence {/*}f=i with /* G W^ for all 
i = TLj^, the acceptance criterion (30) is satisfied whenever 
<^/c ^ 5 = 2L/(1 — cr), where L is the Lipschitz constant for 
VF over IR.^. 

Proof: The proof follows similarly from the proof of 
Lemma 4 in [12], where we only consider /* G R!f:, and hence 
only require F to be Lipschitz continuously differentiable over 

ni. m 
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Lemma 4. The sequence {f^}ke7L^ generated by SPIRAL is 
such that lim/e^oo ^^^^ = 0, and there exists <l> G R such that 
lim/e^oo ^(/^) = '^. 

Proof: The proof follows identically as the proof of 

Lemma 4 in [12] as no modifications are required for the 

constraints within the proof. ■ 

With the above lemmas in hand, we are now ready to state 

the convergence result. 

Theorem 1. Suppose that SPIRAL is applied to (27), where 
(AI) through (A3) hold, then all accumulation points are 
critical points, and hence SPIRAL converges to a minimizer 
of (27). Moreover, the sequence of objective values converges 
sublinearly to the minimal value. Additionally, if F is strongly 
convex, the sequence of objective values converges R-linearly. 

Proof: Using the above lemmas, the proof that the ob- 
jective values converge follows from [12]. Assume for con- 
tradiction that / is an accumulation point that is not critical. 
Let {f^^}je'z+ a subsequence such that Hirj^oo f^^ = /• If 
the sequence {ak^jje'E^ were bounded, we would have from 
Lemma 2 that ||(5^^+^||2 > e for some e > and all j large 
enough. However, this is in contradiction Lemma 4, hence 
{akj}je'z+ must be unbounded. In particular, we must have 
that for some j large enough, ak^ > r] max(amax, 5), meaning 
a = akj /t] >a must have been tried yet failed the acceptance 
criterion, which is prohibited by Lemma 3, since any a > a 
satisfies the acceptance criterion. This further contradiction 
shows that any noncritical point cannot be an accumulation 
point, hence all critical points are accumulation points, and 
due to the convexity of <^, / optimizes (27). 

Following [43], it can be shown that for some constant c > 



*(/') - $(/) < 



k' 



that is the objective values converge at a sublinear rate. 
Moreover, if F is strongly convex, then there exist constants 
C > and r e (0, 1) such that 

hence the objective values exhibit i?-linear convergence. ■ 

E. Uniqueness of the minimizer 

Under the general assumptions (A1)-(A3) in Section IILD, 
the set argminj ^(/) will be a compact convex set (c.f. [42]) 
and may not be the singleton {/}. That is, the solution to 
the minimization (7) may not be unique. The most general 
condition under which the minimizer / is unique is when the 
objective $ is strictly convex, which is obtained when either 
or the penalty pen is strictly convex. In the underdetermined 
case of interest, (j) cannot be strictly convex, since it is trivial 
to find a vector z in the kernel of A, and as a consequence 
F{f + az) = F{f) for any a G R. In this case, a strictly 
convex objective can be obtained if the penalty is strictly 
convex, however most interesting choices for the penalty term 
(e.g., ^i-norm, total variation) preclude this possibility. More 
precisely, the penalty need only be strictly convex on the null 
space of A, however this condition is difficult to verify in 
practice. 



A different approach must then be taken which utilizes 
the precise choice of pen and functional form for F, in 
which conditions for a unique minimizer are established on 
a case-by-case basis. We do not consider the RDP-based 
penalty, since convergence to a global optimum cannot be 
explicitly guaranteed, and therefore only focus on the £i and 
TV penalties. We show for these two penalties that if the 
solution is not unique, then there exist two distinct solutions 
/ and / such that Af = Af and pen(/) = pen(/). This 
means that these solutions are identical in the sense that they 
are equally faithful to the data and equally sparse or smooth 
as jneasured by the penalty. In certain cases, A is such that 
Af = Af only if / = /, hence the solution is unique. 

We begin the analysis by expressing the TV seminorm as 
II/IItv = \\Df\\i, with D = [A; 1^2] where Di and D2 are 
respectively the horizontal and vertical first-order difference 
matrices. Therefore both the ii and TV penalties can be cast 
into the form pen(/) = ||5/||i where B = rW^ for the 
ii penalty, and B = tD for the TV penalty. The KKT 
optimality conditions for a solution / are such that there exists 
a corresponding Lagrange multiplier vector A (not necessarily 
unique) such that, together with /, 



OGVF(/)+Spen(/)-A, / > 0, 



A>0. 



(31) 



= A^/, 

To examine the subdifferential of the penalty, note the compo- 
sition rule that if g{x) = h{Bx), then dg{x) = B^dh{Bx), 
that is if bi are the rows of B, then 



dg{x) = I 






Sibi.Si e 



{dh{Bx))iy 



When h{x) = \\x\\i, the subdifferential is 

{dh{x)). 



sign{xi) if Xi ^ 0, 
J-1,1] ifx, = 0, 
and thus the first KKT condition in (31) is equivalent to 

0G{vF(/)-A + ^sign(6f/)6, + ^8,6, 15, G[-l,l]} 

where S = {i : bf f ^ 0} is the support set of Bf. 

If / is not the unique solution, then there exists another 
solution /, distinct from /, that also satisfies the KKT con- 
ditions with a corresponding A. What we show next is that, 
without loss of generality, we can assume that / and / share 
certain properties that simplifies the proceeding analysis. 

We know that / and / must lie on a^comgact convex set. 
In particular, any point ^(7) = (1 — 7)/ + 7/ with 7 G [0, 1] 
must also be a solution. Define the following index sets: 

S{x) = {i : bfx + 0}, Q{x) = {i:xi^ 0}. 

Now clearly S{z{j)) C S{f) U S{f) and ^(^(7)) C Q(/) u 
Q{f), but as we sweep through 7, the sets S{z{'y)) and 
Q{z{j)) may not remain constant. However they will be 
constant on certain intervals. To see this, define the following: 

Ts = be [0, 1] : bfz{j) = for some i e S{f) U S{f)}, 
TQ = {je [0, 1] : Zi{^) = for some i e Q{f) U Q(/)}, 
T = TsUTq. 
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These sets basically count the number of zero-crossings that 
occur in ^(7), excluding the components that are always zero 
(e.g„ if the line segment between / and / lies on a coordinate 
plane), and hence they are finite sets. Therefore, F = {7^}^! 
for some N < 00, hence there exist open intervals (7^,7^+1) 
on which S{z{'y)) and Q{z{'y)) are^constant. So without loss 
of generality, we can select / and / to be two distinct points 
in one of these intervals, so that we now have S = S{f) = 
S{f) and Q = Q{f) = Q{f)- Furthermore, we will also have 
sign(6f /) = sign(6f /) for all i G S. 

Using the properties established above, and from the first 
KKT condition for both (/, A) and (/, A), we have that 



OG 



{VF(/) 






VF(/)-A + A 

-Si)bi : Si, Si e [-1,1]|. 



From here we multiply componentwise by {f — f) and sum 
to yield 

= (/ - ffiVFif) - VF(/)) + Ta + FX 

where we^have used the complementarity KKT condition that 
f^X = f^X = 0, and the fact thatjf / = bj f = OJor all 
i ^ S. Now since A^ > O^nly^f fi = (similarly A^ > 
only if fi = 0), and s^ince fi = fi = for alH ^ Q, we also 
have that f^X = f^X = 0. Therefore we deduce that 







E 



Vi 



-^(efAf + /3){efAf + , 



"A(f-f) 



and since this is a sum of nonnegative terms (if ^ > 0, the 
fractionjs strictly positive) it can only assume^ zero value if 
^{f ~ f) = 0- This means that if both / and / are solutions, 
they need to be equivalent up to their projections. Since they 
are both minimizers of the objective, they necessarily have 
identical objective values, and since they both achieve the same 
value for F, it is clear that pen(/) = pen(/). Hence the 
solutions are also equally sparse or smooth with respect to the 
penalty. 

In the case of the £1 norm, we can be a bit more precise 
by considering the fact that we often examine the solution 
in terms of the coefficients = W^ f. In this case, if AW 
satisfies a restricted-isometry type property [2] for vectors of 
sparsity s = 2|S'|^weJiave {l-Ss)\\0-e\l2 < pW(^-6>)||2, 
and hence AW{0 — 0) = only if — = 0, contradicting 
the distinctness of 6 and 6, hence if AW satisfies RIP, the 
solution is unique. 

IV. Numerical experiments 
A. Simulation Setup 

Although the algorithms described heretofore are applicable 
to a wide range of imaging contexts, here we demonstrate 
the effectiveness of the proposed methods on a simulated 
limited-angle emission tomography dataset. We compare our 
algorithm with the currently available state-of-the-art emission 
tomography reconstruction algorithms [31]. 

In this experimental setup, we wish to reconstruct the true 
axial emission map /^ (Fig. 2(a)) as accurately as possible. 



3) CD 




(a) 



(b) 



(c) 



(d) 



Fig. 2: Experimental setup: (a) true emission image, (b) 
attenuation map, (c) noisy projection data, (d) estimate used 
as initialization. 



The photon flux described by this emission map is subject 
to the attenuation effects caused by the various densities of 
tissue through which the photons must travel to reach the 
detector array. The simulated attenuation map /i (Fig. 2(b)) 
is assumed known during the reconstruction process. The 
simulated emission and attenuation images are standard test 
images included in the Image Reconstruction Toolbox (IRT) 
by Fessler [45]. 

The limited-angle tomographic projection R, corresponding 
to parallel strip-integral geometry with 128 radial samples and 
128 angular samples spaced uniformly over 135 degrees, was 
also generated by the IRT software [45]. The resulting sensing 
matrix is then given hy A = diag[exp(— i?/i)]i?, with the noisy 
tomographic data y simulated according to the inhomogeneous 
Poisson process (1). We simulated ten realizations of the data 
y in order to examine the ten-trial average performance of all 
the reconstruction methods presented. We only show images 
reconstructed using a particular realization of the data shown 
in Fig. 2(c)). In this case, the noisy sinogram observations 
have a total photon count of 2.0 x 10^, a mean count over 
the support of the tomographic projections of 18.08, and a 
maximum count of 44. 

B. Algorithm Setup 

We evaluate the proposed SPIRAL approaches (£1, TV, 
RDP, and RDP-TI), two competing Poisson reconstruction 
methods [31], [45], and the SpaRSA algorithm [12]. For the 
SPIRAL-^i method, the Daubechies family (DB-2 through 
DB-8) were evaluated as the basis W. Here we only present 
results using the DB-6 wavelet basis as the results were similar 
across bases with DB-6 having a marginal lead in performance. 
Using the SPIRAL-TV method, we investigate the effects of 
not enforcing the near-monotonicity acceptance criterion (13). 
For all methods that enforce (13), we used 7^ = 2, a = 0.1, 
M = 10. Also, as described in Section III-C, we evaluate how 
the accuracy of solving the subproblem impacts the global per- 
formance of the SPIRAL-^i and TV methods. When solving 
these subproblems stringently (denoted Tight in the figures and 
tables), a minimum of 10 and a maximum of 100 iterations 
were used, with a convergence tolerance tolsuB = 1 x 10~^; 
these parameters were relaxed to a maximum of 10 iterations, 
and a tolerance tolsuB = 1 x 10~^ when the subproblems 
are solved less exactly (denoted Loose). In all the SPIRAL 
methods, we set a^ax = l/<^min = 1 x 10^^. 

We compare our proposed approaches with two competing 
Poisson reconstruction methods. The first, denoted SPS-OS, 
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(a) Ground Truth (b) SPIRAL-TV (c) SPIRAL-TV (d) SPIRAL-TV (e) SPIRAL-TV (f) SPIRAL-RDP (g) SPIRAL-RDP-TI 

Loose Monotonic Loose Nonmonotonic Tight Monotonia Tight Nonmonotonic (Translation Variant) (Cycle-Spun) 

RMSE = 24.404% RMSE = 24.962% RMSE = 24.526% RMSE = 24.467% RMSE = 33.959% RMSE = 27.557% 



vi)(Da^il>fX»CDi2 



(h) SPIRAL-^i 

Loose (DB-6) 

RMSE = 28.626% 



(i) SPIRAL-^i 

Tight (DB-6) 

RMSE = 28.665% 



G) SpaRSA 

h-h (DB-6) 

RMSE = 31.172% 



(k) SPS-OS 
(Huber potential) 
RMSE = 27.555% 



(1) SPS-OS 

(Quadratic Potential) 

RMSE = 29.420% 



(m) EPL-INC-3 
(Huber Potential) 
RMSE = 24.748% 



(n) EPL-INC-3 

(Quadratic Potential) 

RMSE = 26.474% 



Fig. 3: Single-trial reconstructed images for all methods considered. Note RMSE (%) = 100 • ||/ — /^112/11/^112- 



uses a separable paraboloidal surrogate with ordered subsets 
algorithm [31]. The second, denoted EPL-INC-3, employs 
an incremental penalized Poisson likelihood EM algorithm 
and was suggested by Prof. Fessler as representative of the 
current state-of-the-art in emission tomographic reconstruc- 
tion. Both of these methods are available as part of the IRT 
[45]; specifically, we used the pwls_sps_os and epl_inc 
functions from the toolbox. In addition to these Poisson 
methods, we also compare to the SpaRSA algorithm [12] 
which solves the ^i -regularized least-squares (^2-^1) problem. 
Like the SPIRAL-^i approaches, we only present results when 
W is the DB-6 wavelet basis. As the solution provided by 
SpaRSA is not guaranteed to be nonnegative, we threshold 
the result to obtain a feasible - and therefore more accurate 
- solution. Including this result allows us to demonstrate the 
effectiveness of solving the formulation (7) that utilizes the 
Poisson likelihood. 

All of the methods considered here were initialized with the 
estimate shown in Fig. 2(d). This initialization results from two 
iterations of a non-convergent version of the EPL-INC-3 algo- 
rithm, itself initialized by the filtered back-projection estimate. 
All algorithms executed for a minimum of 50 iterations, and 
global convergence was declared when the relative change in 
the iterates (24) fell below tolP = 5 x 10""^. 

Lastly, in all of the experiments presented in this paper, we 
chose any parameters associated with each algorithm (such 
as^r) to minimize the RMS error (RMSE (%) = 100 • 
11/ — /"^ II 2/ 11/"^ II 2) of the reconstruction. While this would 
not be possible in practice, it does allow us to compare the 
best-case performance of various algorithms and penalization 
methods. In practical settings, regularization parameters can 
be chosen via cross-validation. This is particularly well-suited 
to many photon-limited imaging applications in which each 
detected photon has a time stamp associated with it; this timing 
information can be used to construct multiple independent and 
identically distributed realizations of the underlying Poisson 
process in software. The details of this procedure are a 
significant component of our ongoing research. 



C Results analysis 

From the results presented in Table I, we see that out 
of the proposed SPIRAL approaches, the method utilizing 
the total variation penalization achieves the lowest RMSE 
and highest visual quality, followed by SPIRAL-RDP-TI and 
SPIRAL-£i. This bolsters the notion that there is much to 
be gained by considering additional image structure beyond 
that of a sparse representation in the basis W. The SPIRAL- 
RDP method based on the non-cycle-spun partitions simply 
are not competitive due to the high bias in considering only a 
single shift of the RDP structure. Examining Fig. 3(f), we see 
that this bias is manifest in the image as blocking artifacts 
due to the RDP structure not fortuitously aligning to any 
strong edges in the image. Also from Table I we see that 
the EPL-INC-3 method with the Huber potential offers the 
toughest competition to the RMSE achieved by the SPIRAL- 
TV approaches. 

Examining the reconstructed images in Fig. 3, we see 
that the visual fidelity correlates strongly with the RMSE. 
The SPIRAL-TV and EPL-INC-3 results clearly do best at 





Single-Trial 


Ten-Trial Average 


Method 


RMSE (%) 


Time (s) 


RMSE (%) 


Time (s) 


SPS-OS Huber 


27.555 


9.078 


27.057 


9.094 


SPS-OS Quad 


29.420 


5.354 


29.049 


5.574 


EPL-INC-3 Huber 


24.748 


20.302 


24.462 


16.704 


EPL-INC-3 Quad 


26.474 


8.713 


26.013 


9.647 


SpaRSA 


31.172 


5.946 


29.987 


3.730 


SPIRAL-^i (L) 


28.626 


6.933 


28.050 


5.396 


SPIRAL-^i (T) 


28.665 


15.547 


27.980 


20.006 


SPIRAL-TV (L, M) 


24.404 


15.418 


24.270 


10.102 


SPIRAL-TV (L, NM) 


24.962 


7.821 


24.868 


4.721 


SPIRAL-TV (T, M) 


24.526 


25.423 


24.352 


20.900 


SPIRAL-TV (T, NM) 


24.467 


21.505 


24.571 


15.046 


SPIRAL-RDP 


33.959 


3.118 


34.946 


2.323 


SPIRAL-RDP-TI 


27.557 


5.313 


27.669 


4.756 



TABLE I: Reconstruction RMSE and computation time for 
both the single-trial results, and results averaged over ten 
trials. Note: L = Loose, T = Tight, M = Monotonic, NM = 
Nonmonotonic, RMSE (%) = 100 • ||/ - /*||2/||/'^||2. 
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Intensitiy Cross Section 



RMSE vs. CPU Time 
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Fig. 4: Intensity cross sections of the best-in-class methods: 
EPL-INC-3 with Ruber potential, SPIRAL-^i with loose sub- 
problem convergence criteria, monotonic SPIRAL-TV with 
loose subproblem convergence criteria, and SPIRAL-RDP-TI. 
The insert image shows the location of the cross section. Note 
that the pixel boundaries in the image cause the stepwise 
nature of the profiles. 



capturing strong edges in the image while smoothing noise in 
the homogeneous regions, with the SPIRAL-TV method more 
correctly smoothing homogeneous regions. This conclusion 
is supported by Fig. 4 in which we further analyze the 
best-in-class SPIRAL and EPL-INC-3 methods. Between the 
EPL-INC-3 methods, the Ruber potential is more effective 
at reducing the rough noise-like deterioration that is quite 
pronounced when using the quadratic potential. Rere we see 
that both SPIRAL-TV and EPL-INC-3 results have sharp 
edges, but significant variations in the regions of homogeneous 
intensity are a consistent source of error for the EPL-INC-3 
method. These variations cause many darker spots that could 
be misinterpreted as regions of low uptake. Figures 3 and 4 
both show that the SPIRAL-RDP-TI and ^i -based approaches 
tend to over-smooth the entire image; however the SPIRAL- 
RDP-TI method better captures the high intensity regions. 
Righ-scale wavelet artifacts are seen throughout the SPIRAL- 
£i and SpaRSA reconstructions. In the SpaRSA results shown 
in this section, we clipped the final reconstruction to be 
nonnegative. Even after this clipping operation, the RMSE 
of the final result was significantly higher than the RMSE 
associated with methods using the Poisson log-likelihood. 
This suggests that the effort required to use the Poisson 
log-likelihood is justified by tangible performance gains over 
methods associated with a penalized least-squares objective. 
Lastly, prominent streaking artifacts in the SPS-OS methods 
hinder their accuracy, with using a Ruber potential resulting 
in a slightly better performance than a quadratic potential. 

While accuracy and visual quality are held paramount in to- 
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Fig. 5: Convergence of the RMSE vs computation time for the 
best-in-class methods. The termination criteria was reached at 
the point indicated by the solid circle. 

mographic reconstruction, the computational cost and conver- 
gence behavior of the chosen methods should not be neglected. 
Figure 5 shows the convergence behavior of the best-in-class 
methods. Although the SPIRAL-£i and RDP-TI approaches 
yield higher RMSE, they show desirable convergence behavior 
with sharp initial decreases in RMSE and termination in under 
eight seconds. The SPIRAL-TV method also shows this sharp 
decrease, but exhibits a prolonged period where the RMSE 
changes little. The EPL-INC-3 method shows none of these 
traits, slowly changing the RMSE throughout its execution 
until it finally reaches the termination criteria after 20 seconds. 
Next consider Fig. 6, where we show the RMSE conver- 
gence for the variations of the SPIRAL-TV algorithm. Com- 
paring the top and bottom axes, we see that by not enforcing 
the near-monotonicity acceptance criteria in (13) accelerates 
the convergence rate at the potential cost of large increases 
of the RMSE during the execution of the algorithm (seen at 
the near the two second mark when tightly solving the TV 
subproblem). Although the nonmonotonic algorithms typically 
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Fig. 6: Comparison of the RMSE convergence rates for the 
different variations of the SPIRAL-TV method. The termina- 
tion criteria was reached at the point indicated by the solid 
circle. 
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Fig. 7: Comparison of the RMSE convergence rates for the 
different variations of the SPIRAL-^i method. The termination 
criteria was reached at the point indicated by the soHd circle. 

yield early sharp decreases in RMSE, the ten-trial average 
behavior in Table I shows that the monotonic algorithm will 
achieve a more accurate solution at termination. This suggests 
that a two- stage approach of starting with a nonmonotonic 
algorithm to achieve quick, yet approximate, convergence then 
switching to the monotonic algorithm may be the best overall 
approach. 

Figures 6 and 7 support the use of relaxed convergence 
criteria for the TV and li subproblems. Table I also shows 
that little is lost in terms of final RMSE by using this less- 
stringent criterion. In fact, by combining the nonmonotonic 
option and loose subproblem convergence criteria results in 
a SPIRAL-TV approach that, on average, nearly matches the 
best EPL-INC-3 method in terms of RMSE, yet is approxi- 
mately four times faster. Further, this algorithm is even faster 
than the least accurate SPS-OS method. Therefore, we believe 
that further investigation of nonmonotonic and approximate 
subproblem algorithms is a fruitful area of research for fast 
and accurate image reconstruction. We finish the discussion 
of the numerical results by directing the interested reader to 
additional experimental results in [7], [46] which demonstrate 
the effectiveness of the proposed SPIRAL method in a com- 
pressed sensing context. 

V. Conclusion 

We have formulated the general goal of reconstructing 
an image from photon-limited measurements as a penalized 
maximum Poisson likelihood estimation problem. To obtain 
a solution to this problem, we have proposed an algorithm 
that allows for a flexible choice of penalization methods, 
and focused particularly on sparsity-promoting penalties. In 
particular, we detail the cases where the penalty corresponds 
to the sparsity-promoting li norm of the expansion coefficients 
in a sparsifying basis, is related to the complexity of a 
partition-based estimate, or is proportional to the total variation 
of the image. We establish mild conditions for which this 
algorithm has desirable convergence properties, although in 
practice it is beneficial to relax these conditions to attain 
faster (albeit nonmonotonic) convergence. We demonstrate the 
effectiveness of our methods through a simulated emission 
tomography example. When our total variation regularized 
method is applied to this problem, the resulting estimates 



outperform the current state-of-the-art approaches developed 
specifically for emission tomography. In particular, it results in 
fewer spurious artifacts than wavelet-regularized methods, and 
unlike partition-regularized methods, is the result of a convex 
optimization procedure. 
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